
where $f(\cdot)$ is the nearest-neighbor interpolation. Therefore, we formulate the learning objective for feature refinement as

$$\arg\min_{a_L,\,a_H}\,\max_{W_D}\;\big\{\mathcal{L}^F_{\mathrm{Adv}}(a_L, a_H, W_D) + \mathcal{L}^F_{\mathrm{MSE}}(a_L, a_H)\big\}_{i \in N}, \tag{6.13}$$

where $\mathcal{L}^F_{\mathrm{Adv}}(a_L, a_H, W_D)$ is the adversarial loss, defined as

$$\mathcal{L}^F_{\mathrm{Adv}}(a_L, a_H, W_D) = \log(D(a_H; W_D)) + \log(1 - D(a_L; W_D)), \tag{6.14}$$

where $D(\cdot)$ consists of several basic blocks, each with a fully connected layer and a LeakyReLU layer. In addition, we adopt several discriminators to refine the features during the binarization training process.
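As a concrete illustration, the following is a minimal PyTorch sketch of such a discriminator; the number of blocks and the hidden widths are assumptions, since the text does not specify them:

```python
import torch.nn as nn

class FeatureDiscriminator(nn.Module):
    """D(.; W_D): a stack of basic blocks, each a fully connected layer
    followed by LeakyReLU, ending in a sigmoid score in (0, 1).
    Block count and hidden widths are illustrative assumptions."""

    def __init__(self, in_dim, hidden_dims=(512, 256)):
        super().__init__()
        layers, prev = [], in_dim
        for h in hidden_dims:
            layers += [nn.Linear(prev, h), nn.LeakyReLU(0.2)]
            prev = h
        layers += [nn.Linear(prev, 1), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, a):
        # Flatten feature maps (N, C, H, W) to vectors before scoring.
        return self.net(a.flatten(1))
```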

Moreover, $\mathcal{L}^F_{\mathrm{MSE}}(a_L, a_H)$ is the feature loss between the low-level and high-level features, expressed as the MSE

$$\mathcal{L}^F_{\mathrm{MSE}}(a_L, a_H) = \frac{\mu}{2}\,\|a_L - a_H\|_2^2, \tag{6.15}$$

where $\mu$ is a balancing hyperparameter.
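Putting Eqs. 6.14 and 6.15 together, the feature refinement terms might be computed as in the sketch below; the helper name and the value of `mu` are assumptions, `F.interpolate` plays the role of $f(\cdot)$, and the two feature maps are assumed to have matching shapes after interpolation:

```python
import torch
import torch.nn.functional as F

def feature_refinement_losses(a_low, a_high, disc, mu=0.1):
    """L^F_Adv (Eq. 6.14) and L^F_MSE (Eq. 6.15) for one feature pair.
    a_low, a_high: low-/high-level feature maps (N, C, H, W);
    disc: discriminator D(.; W_D); mu: balancing hyperparameter
    (its value here is an assumption)."""
    # f(.): nearest-neighbor interpolation aligns a_high spatially with a_low.
    a_high = F.interpolate(a_high, size=a_low.shape[-2:], mode="nearest")
    eps = 1e-8  # numerical safety term, not part of the original formulation
    l_adv = (torch.log(disc(a_high) + eps)
             + torch.log(1.0 - disc(a_low) + eps)).mean()
    l_mse = 0.5 * mu * (a_low - a_high).pow(2).sum()
    return l_adv, l_mse
```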

6.2.4 Optimization

For a specific task, the conventional problem-dependent loss $\mathcal{L}_S$, e.g., the cross-entropy, is considered; thus the learning objective is defined as

$$\arg\min_{w_i,\,\alpha_i,\,p_i}\;\big\{\mathcal{L}_S(w_i, \alpha_i, p_i)\big\}_{i \in N}, \tag{6.16}$$

where $p_i$ denotes the other parameters of the BNN, e.g., the parameters of BN and PReLU. Therefore, the general learning objective of BiRe-ID combines Eqs. 6.7–6.9, 6.13, and 6.16. For each convolutional layer, we sequentially update $w_i$, $\alpha_i$, and $p_i$.
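A schematic training step for this sequential update might look as follows; the split into three optimizers (one per parameter group) is an assumption about how "sequentially update" is realized, with the loss recomputed after each partial update:

```python
import torch

def train_step(loss_fn, opt_w, opt_alpha, opt_p):
    """One sequential update of w_i, then alpha_i, then p_i.
    loss_fn: closure returning L_S plus the refinement terms;
    opt_w/opt_alpha/opt_p: optimizers over the respective parameter
    groups (this three-optimizer split is an illustrative assumption)."""
    for opt in (opt_w, opt_alpha, opt_p):
        opt.zero_grad()   # clear this group's stale gradients
        loss = loss_fn()  # recompute L_S + refinement losses
        loss.backward()
        opt.step()        # update only this parameter group
```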

Updating $w_i$: Consider $\delta_{w_i}$ as the gradient of the real-valued kernels $w_i$. Thus,

$$\delta_{w_i} = \frac{\partial \mathcal{L}}{\partial w_i} = \frac{\partial \mathcal{L}_S}{\partial w_i} + \frac{\partial \mathcal{L}^K_{\mathrm{Adv}}}{\partial w_i} + \frac{\partial \mathcal{L}^F_{\mathrm{Adv}}}{\partial w_i} + \frac{\partial \mathcal{L}^K_{\mathrm{MSE}}}{\partial w_i} + \frac{\partial \mathcal{L}^F_{\mathrm{MSE}}}{\partial w_i}. \tag{6.17}$$

During the backpropagation of the softmax loss $\mathcal{L}_S(w_i, \alpha_i, p_i)$, the gradients first go to $\hat{w}_i$ and then to $w_i$. Thus, we formulate it as

$$\frac{\partial \mathcal{L}_S}{\partial w_i} = \frac{\partial \mathcal{L}_S}{\partial \hat{w}_i}\,\frac{\partial \hat{w}_i}{\partial w_i}, \tag{6.18}$$

where

$$\frac{\partial \hat{w}_i}{\partial w_i} =
\begin{cases}
2 + 2w_i, & -1 \le w_i < 0, \\
2 - 2w_i, & 0 \le w_i < 1, \\
0, & \text{otherwise},
\end{cases} \tag{6.19}$$

which is an approximation of twice the Dirac delta function [159]. Furthermore,
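This piecewise gradient is straightforward to realize as a custom autograd function; the sketch below assumes a hard sign binarization in the forward pass:

```python
import torch

class ApproxSign(torch.autograd.Function):
    """Sign binarization with the piecewise backward of Eq. 6.19."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        grad = torch.zeros_like(w)      # 0 outside [-1, 1)
        neg = (w >= -1) & (w < 0)
        pos = (w >= 0) & (w < 1)
        grad[neg] = 2 + 2 * w[neg]      # -1 <= w_i < 0
        grad[pos] = 2 - 2 * w[pos]      # 0 <= w_i < 1
        return grad_out * grad
```

In a binarized convolution, `ApproxSign.apply(w)` would then stand in for the hard sign so that gradients reach the latent real-valued kernels.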

$$\frac{\partial \mathcal{L}^K_{\mathrm{Adv}}}{\partial w_i} = \frac{1}{D(w_i; W_D)}\,\frac{\partial D}{\partial w_i}. \tag{6.20}$$

$$\frac{\partial \mathcal{L}^K_{\mathrm{MSE}}}{\partial w_i} = \lambda\,(w_i - \alpha_i \hat{w}_i)\,\alpha_i, \tag{6.21}$$

$$\frac{\partial \mathcal{L}^F_{\mathrm{Adv}}}{\partial w_i} = -\frac{1}{1 - D(a_i; W_D)}\,\frac{\partial D}{\partial a_i}\,\frac{\partial a_i}{\partial w_i}\,\mathbb{I}(i \in L), \tag{6.22}$$
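In an implementation, these adversarial gradients need not be hand-derived: backpropagating the corresponding log terms reproduces Eqs. 6.20 and 6.22 automatically. A minimal sketch for the kernel case, assuming the discriminator above (the same pattern applies to $\log(1 - D(a_L; W_D))$ for Eq. 6.22):

```python
import torch

def kernel_adv_grad(w, disc):
    """Gradient of log(D(w; W_D)) w.r.t. w, i.e., Eq. 6.20, obtained
    by autograd instead of the closed form (1/D) * dD/dw."""
    w = w.detach().requires_grad_(True)
    score = disc(w.flatten().unsqueeze(0))    # treat the kernel as one sample
    torch.log(score + 1e-8).sum().backward()  # eps is a numerical-safety assumption
    return w.grad
```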